A master equation for power laws

We propose a new mechanism for generating power laws. Starting from a random walk, we first outline a simple derivation of the Fokker–Planck equation. By analogy, starting from a certain Markov chain, we derive a master equation for power laws that describes how the number of cascades changes over time (cascades are consecutive transitions that end when the initial state is reached). The partial differential equation has a closed form solution which gives an explicit dependence of the number of cascades on their size and on time. Furthermore, the power law solution has a natural cut-off, a feature often seen in empirical data. This is due to the finite size a cascade can have in a finite time horizon. The derivation of the equation provides a justification for an exponent equal to 2, which agrees well with several empirical distributions, including Richardson’s Law on the size and frequency of deadly conflicts. Nevertheless, the equation can be solved for any exponent value. In addition, we propose an urn model where the number of consecutive ball extractions follows a power law. In all cases, the power law is manifest over the entire range of cascade sizes, as shown through log–log plots in the frequency and rank distributions.


Introduction
A power law is a nonlinear relationship between two quantities x and y that can be modelled generically by the formula $y = a x^k$, where a and k are constants: k is the exponent of the power law and a sets the width of the scaling relationship. A power law is a scale-invariant relationship, $f(cx) \propto f(x)$, that holds for any value of c. Graphically, this implies that the curve describing the relationship between x and y maintains its shape under any dilation. Moreover, power laws appear as straight lines on a log-log plot, which is called the signature of the power law.
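As a small illustration of the two properties just described, the sketch below evaluates a power law on a logarithmic grid, recovers the exponent as the least-squares slope on log-log axes, and checks that rescaling x by a factor c only rescales y by a constant. The values of a, k and c and the helper names are arbitrary choices for this example, not quantities from the paper.

```python
import math

def power_law(x, a=3.0, k=-2.0):
    """Evaluate y = a * x**k (a and k are illustrative values)."""
    return a * x ** k

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    den = sum((u - mx) ** 2 for u in lx)
    return num / den

# Grid spanning three orders of magnitude in x.
xs = [10 ** (i / 10) for i in range(0, 31)]
ys = [power_law(x) for x in xs]

# The slope on log-log axes recovers the exponent k.
slope = loglog_slope(xs, ys)

# Scale invariance: f(c*x) / f(x) is the same constant c**k for every x.
c = 7.0
ratios = [power_law(c * x) / power_law(x) for x in xs]
```

The straight-line signature is a necessary check rather than a sufficient one, which is why the fitted range matters when the same test is applied to empirical data.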
The signature can be employed to analyse empirical data by comparing the distribution of the data on a log-log plot with the best fitting power law. This representation was probably first introduced by J. C. Willis in 1922 to plot the distribution of the number of species in a genus [1]. Many lognormal relationships appear as power laws when plotted over small ranges. As a consequence, to assert that a relationship between two variables is a power law, it should hold for at least two orders of magnitude [2].
In the next section, we enumerate some representative examples of power law distributions from physics, biology, ecology and social sciences. Furthermore, we list the most well-known mechanisms for generating power laws. In the third section, we proceed to propose a Markov process model for generating power laws. First, we outline a derivation of the Fokker-Planck equation by starting from a random walk and taking a continuum limit from a discrete state space. This is a well-known approach for deriving the Fokker-Planck equation. We provide our own treatment of this procedure by changing the difference operator Δ acting on discrete functions to the differential operator d that acts on real variables. We then propose a Markov chain that displays a power law in its long-term equilibrium distribution. Based on this and in analogy with the Fokker-Planck equation derivation, we obtain a master equation for power laws that we simplify to an analytically tractable form. Surprisingly, the solution can be written in closed form. Afterwards, in the fourth section, we propose an urn model where the number of cascades (consecutive balls drawn) scales according to a power law. This improves on an urn model by Brunk [3] that displays a power law relationship only in a limited range. Finally, we summarize our results in the Conclusion.

Background
The main challenge regarding power law distributions is understanding their origin. Broadly speaking, there are two ways power laws can emerge: probability transformations and generative processes. Probability transformations derive power law distributions from other distributions. Generative processes are algorithms that create new distributions. Here, we provide more details on generative processes, because our own work is of this nature. To the best of our knowledge, there does not exist a comprehensive review of all the power law distribution generative models (even if [1] finds a large set), and it is not in the scope of this paper to provide one.
There are three main strategies to obtain a power law distribution by transforming an existing probability distribution: combining exponential distributions, inverting quantities or looking at the extreme values of distributions.
In the first case, the transformation consists of substituting the variable of an exponential distribution with another exponential distribution. The resulting distribution follows a power law, in which parameters depend on the original exponential distribution. The distribution of sizes of populations that grow exponentially (i.e. with infinite carrying capacity) and can suddenly become extinct at any time step with the same probability follows a power law that can be derived from this model [4].
The second case requires the inversion of the quantities of the probability distribution. If these quantities pass through zero, the transformation results in a power law distribution with exponent equal to 2, which arises from differentiating the inverted variable. The most notable example of a power law distribution obtained in this way occurs in the paramagnetic phase of the Ising model [5,6].
The third way originates from the extreme value theorems which provide results on the asymptotic behaviour of the extreme realizations. The Pickands-Balkema-de Haan theorem [7,8] states that the conditional distribution of a random variable above a certain threshold tends to a Pareto distribution when the threshold tends to infinity. Therefore, it provides a rationale for the widespread observation of Pareto or power law behaviour, since it is the limiting behaviour of large events for a whole class of probability distributions [9]. Analogously, the Fisher-Tippett-Gnedenko theorem [10,11] describes the possible distributions that the maximum value can have; this can shed light in cases where a simple power law does not appear from the application of the Pickands-Balkema-de Haan theorem [12].
In addition, we identified in the literature at least four notable generative processes for power law distributions: phase transitions (along with self-organized criticality), random walks, sample-space reduction processes and the Yule processes.
The idea underlying phase transitions and critical phenomena is simple but powerful. In systems governed by a single length scale, the scale can diverge in some given conditions, giving birth to scale-free systems in which quantities are distributed as a power law. The precise point at which the length scale diverges is called a critical point or continuous phase transition. Nevertheless, it is unlikely that the parameters that regulate the phase transition of a real-world system happen to fall on that specific value. So, the existence of critical phenomena is not enough to explain the presence of power law distributions in many natural and social systems. But some systems appear to self-organize to lie close to critical points, independently of the initial conditions. This phenomenon, called self-organized criticality, can be royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 9: 220531 exemplified by the classic sandpile model [13], which generates avalanches whose sizes are distributed according to a power law. Other examples of how self-organized criticality generates power law distributions include earthquakes [14], neuronal avalanches [15] and forest fires [1,16]. However, the emergence and applicability of self-organized criticality are debatable [17].
A random walk is a succession of randomly generated steps on a given space [18]. The statistical properties of random walks tend toward universal distributions when the steps are independent of each other and their number grows without bound [19]. As a consequence, they can generate power law distributions even when the underlying rules are very simple. One of the most famous examples (and probably the simplest) relates to the first-return time. It can be proved that, under specific conditions, the distribution of first-return times follows a power law [1,19].
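To make the first-return example concrete, the following sketch uses the classical closed form for the simple symmetric random walk on the integers, where the probability of first returning to the origin at step 2n is C(2n, n) / ((2n − 1) 4^n). The choice of this particular walk and the function names are assumptions made for illustration; the paper does not specify them. The log-log slope of the tail approaches −3/2.

```python
import math

def log_first_return_prob(n):
    """Log of P(first return to the origin at step 2n), simple symmetric walk.

    Uses f_{2n} = C(2n, n) / ((2n - 1) * 4**n), evaluated with lgamma
    so that large n does not overflow.
    """
    log_binom = math.lgamma(2 * n + 1) - 2 * math.lgamma(n + 1)
    return log_binom - n * math.log(4) - math.log(2 * n - 1)

def tail_slope(n_lo, n_hi):
    """Slope of log f_{2n} against log n between two values of n."""
    rise = log_first_return_prob(n_hi) - log_first_return_prob(n_lo)
    run = math.log(n_hi) - math.log(n_lo)
    return rise / run

# Over two orders of magnitude the slope is close to -3/2.
slope = tail_slope(100, 10_000)
```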
The sample-space reduction process models history-dependent systems, such as the formation of sentences, whose number of possible different meanings is progressively reduced the more words are added to it. Corominas-Murtra et al. demonstrate that a power law distribution can be generated from such a process [20]. More specifically, Zipf's Law necessarily emerges from the reduction over time of the number of possible states in which a system can be, as a consequence of symmetry breaking in random sampling processes [20] or from dependency structures in component systems [21]. The sample-space reduction process explains some well-known domain-specific power law generation models, such as preferential attachment.
The Yule process is a generative model developed by G. Udny Yule to explain the power law distribution of the number of species in taxonomic groups [22]. It derives from simple rules. Firstly, starting from a pool of groups, the probability for a group to increase by one is proportional to the number of its elements. Secondly, at any speciation event, there is a possibility of generating a new species belonging to a brand-new taxonomic group. The model is a simplification of reality since it ignores extinction events. Nevertheless, it has been customized to explain the emergence of power laws in other systems such as city sizes [23] or paper citations [24].
Finally, other generative processes include highly optimized tolerance [25,26], the coherent noise model of biological extinction [27], the repeated fragmentation model of fixed length elements [1], the dynamics of times between records in a random process [28], Hawkes processes [29] and out-from-criticality feedback [30]. A master equation for power laws has been previously obtained in the literature [31–35], but that work only considers stationary distributions or transient dynamics, while our results give a non-stationary solution [36]. Furthermore, our choice of transition rates has not been considered before.
To the best of our knowledge, the above examples and mechanisms for generating power laws do not generally address the question: how do power laws change (or are preserved) over time? Answering this question requires a way to specify the time evolution of the power law distribution. Furthermore, in many cases, empirical power laws have a cut-off point above which the relationship no longer holds [1], and this has not been captured by generative models, which are realistic only in specific value ranges. The next section provides a mechanism by which an equation dictating the time evolution of power laws can be obtained. In addition, the solution displays a natural cut-off beyond a certain time horizon.

Markov process models
Below we present an outline for a well-known derivation of the Fokker-Planck equation starting from a discrete Markov chain and taking a continuum limit. By analogy, we introduce a discrete Markov chain that has a power law stationary distribution and then we derive a master equation for power laws.

Fokker-Planck equation
The transition probabilities for a one-dimensional random walk are

The probability for x jumps to the right and n − x jumps to the left is

where $b_i$, $d_i$ are the rates at which the process leaves state i for state i − 1 and for state i + 1, respectively. Furthermore, we define $a_i = d_i - b_i$. Using the forward and backward difference operators $\Delta x_n = x_{n+1} - x_n$ and $\nabla x_n = x_n - x_{n-1}$, the above equation can be written

$$\frac{dp_i(t)}{dt} = -\Delta\big(a_i\,p_i(t)\big) + \Delta\nabla\big(d_i\,p_i(t)\big),$$

which in the continuum limit becomes

$$\frac{\partial p(x,t)}{\partial t} = -\frac{\partial}{\partial x}\big(a(x)\,p(x,t)\big) + \frac{\partial^2}{\partial x^2}\big(d(x)\,p(x,t)\big).$$
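The difference-operator form can be checked numerically. The sketch below assumes that the underlying master equation is the standard birth-death balance, dp_i/dt = d_{i−1} p_{i−1} + b_{i+1} p_{i+1} − (b_i + d_i) p_i, with b_i the rate to state i − 1 and d_i the rate to state i + 1 as defined in the text. That explicit form is an assumption on our part, and the random rates and probabilities are arbitrary test values.

```python
import random

def forward_diff(f):
    """Return the function (Δf)(i) = f(i + 1) - f(i)."""
    return lambda i: f(i + 1) - f(i)

def backward_diff(f):
    """Return the function (∇f)(i) = f(i) - f(i - 1)."""
    return lambda i: f(i) - f(i - 1)

random.seed(0)
M = 12  # number of states in this toy check
b = [random.random() for _ in range(M)]  # rates i -> i - 1
d = [random.random() for _ in range(M)]  # rates i -> i + 1
p = [random.random() for _ in range(M)]  # arbitrary state weights

def master_rhs(i):
    """Assumed birth-death master equation: gain from neighbours minus loss."""
    return d[i - 1] * p[i - 1] + b[i + 1] * p[i + 1] - (b[i] + d[i]) * p[i]

def operator_rhs(i):
    """Right-hand side written as -Δ(a_i p_i) + Δ∇(d_i p_i), with a_i = d_i - b_i."""
    drift = forward_diff(lambda j: (d[j] - b[j]) * p[j])
    diffusion = forward_diff(backward_diff(lambda j: d[j] * p[j]))
    return -drift(i) + diffusion(i)

# Compare the two forms on interior states only, where both neighbours exist.
max_gap = max(abs(master_rhs(i) - operator_rhs(i)) for i in range(1, M - 1))
```

The two expressions agree to rounding error, which is what justifies replacing Δ and ∇ by derivatives in the continuum limit.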

Power law equation
We consider the Markov chain with the following transition matrix:

The dominant contribution to the long-term equilibrium distribution is given by

In figure 2a, we compare numerical solutions for the equilibrium distribution with the approximation (3.7), showing that the power law approximation is highly accurate. The Markov chain given by (3.6) is the embedded chain of the continuous-time Markov process in figure 1b. Using the same notation as in the previous section, the probability $p_i(t)$ satisfies the following differential equation:

where $a(x) = a\,e^{-k/x}$, $b(x) = b\,(1 - e^{-k/x})$ and a, b are constants. By Taylor expanding up to order 1/x, we get $a(x) \simeq a - ak/x$ and $a'(x) + b(x) \simeq bk/x$. Without loss of generality, we can relabel bk as b and the equation reduces to

$$\frac{\partial p(x,t)}{\partial t} = -a\,\frac{\partial p(x,t)}{\partial x} - \frac{b}{x}\,p(x,t). \qquad (3.11)$$

The term containing a/x can be neglected because the probability p(x, t) depends on a power of x (see below), so each term on the right-hand side of (3.11) is of the same order in x, whereas the neglected term is of higher order in 1/x.
Let c(x, t) be the unnormalized solution of equation (3.11). The solution c(x, t) has a closed-form expression

for x > 0 and x < at, where $c(x, 0) = c_0$. In figure 2b, we plot the solution (3.12) for different values of b/a and see a power law behaviour over a range of scales that increases over time. The tail of the solution falls off abruptly just as we see in many real-world examples of power law distributions [1]. Equation (3.11) posits that cascades (consecutive transitions) occur over time in inverse proportion to their size. Furthermore, there is a constant rate a at which they can increase in size. For a finite time horizon t, there is a maximum size that a cascade can reach, namely at, as can be seen in the solution (3.12).
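As a sketch of why a power law with a cut-off moving at speed a is to be expected, one can verify directly that any function of the form below solves equation (3.11). Here F is an arbitrary differentiable profile introduced only for this illustration; the specific profile is fixed by the initial and boundary conditions and is not reproduced here.

```latex
\begin{align}
c(x,t) &= x^{-b/a}\,F(x - at), \\
\frac{\partial c}{\partial t} &= -a\,x^{-b/a}\,F'(x - at), \\
-a\,\frac{\partial c}{\partial x} - \frac{b}{x}\,c
  &= -a\left(-\frac{b}{a}\,x^{-b/a-1}F + x^{-b/a}F'\right) - b\,x^{-b/a-1}F
   = -a\,x^{-b/a}\,F'(x - at).
\end{align}
```

The two sides agree, so every solution carries the power law factor $x^{-b/a}$ multiplied by a profile that is transported along $x = at$. This is consistent with an exponent set by b/a and with a maximum cascade size that grows linearly in time.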
We can get additional insights from equation (3.10) if we consider the second-order Taylor expansion $a'(x) + b(x) \simeq bk/x + (2ak - bk^2)/(2x^2)$. If $k = 2a/b$ then the $1/x^2$ term cancels, and this implies an exponent of 2 in the solution (3.12), since before relabelling the exponent is bk/a. This is not necessary in deriving equation (3.11) but can serve as a rationale for the experimental observation of power laws with exponent 2.
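The expansion can be checked numerically. The sketch below compares the exact coefficient a'(x) + b(x), built from the definitions a(x) = a e^{−k/x} and b(x) = b(1 − e^{−k/x}) given in the text, with its first- and second-order approximations. The numerical values of a, b and x are arbitrary choices for the example.

```python
import math

def exact_coefficient(x, a, b, k):
    """a'(x) + b(x) for a(x) = a*exp(-k/x) and b(x) = b*(1 - exp(-k/x))."""
    e = math.exp(-k / x)
    return a * e * k / x ** 2 + b * (1.0 - e)

def second_order(x, a, b, k):
    """Expansion bk/x + (2ak - b k^2) / (2 x^2)."""
    return b * k / x + (2 * a * k - b * k ** 2) / (2 * x ** 2)

a, b = 1.0, 1.0          # illustrative constants
k_special = 2 * a / b    # value at which the 1/x^2 term cancels
x = 200.0                # a "large" cascade size

# Error of the first-order term bk/x for a generic k and for k = 2a/b.
err_generic = abs(exact_coefficient(x, a, b, 1.0) - b * 1.0 / x)
err_special = abs(exact_coefficient(x, a, b, k_special) - b * k_special / x)

# Residual of the second-order expansion for the generic k.
residual = abs(exact_coefficient(x, a, b, 1.0) - second_order(x, a, b, 1.0))
```

For k = 2a/b the first-order term alone is already accurate to order 1/x^3, so `err_special` is much smaller than `err_generic`.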

Urn model
Building on the insights of the previous section, we propose an urn model that displays power law behaviour. We can approximate the probability (3.8) of moving from state i to state i + 1 as

for large enough i. Hence, if an adequate value of k is chosen, we can expect to see a power law emerging even in a simple model.

Brunk [3] proposes a model claiming to exhibit cascades that occur with frequencies inversely proportional to the cascade size, and provides graphs showing that a power law distribution emerges. We attempted to reproduce the results of the paper, but we obtained an exponential distribution instead of a power law. A power law relationship is seen only in the narrow range of scales used in the graphs of the article (figure 3a). Furthermore, we can prove analytically that the distribution that emerges is an exponential distribution and not a power law. The article [3] has been cited in recent years, and we consider it important to highlight the limitations of the model.

Assume the urn contains N balls in total, of which K are black and the rest are white. The model has the following steps:
(i) A ball is drawn; if it is black then the next ball is drawn, and the process is repeated until a white ball is drawn. This constitutes a cascade within the model. Black balls are drawn without replacement.
(ii) Each time a white ball is drawn, the number of black balls is increased by a fraction g, while the number of white balls stays the same. If a white ball is drawn, then we repeat from step (i).
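To illustrate why step (i) alone produces exponentially distributed cascade sizes, the sketch below computes the exact probability of drawing n black balls without replacement followed by one white ball, for a fixed urn composition. It deliberately ignores step (ii), so K and N are held constant; the urn sizes and the function name are assumptions chosen for the example.

```python
import math

def cascade_probability(n, N, K):
    """P(n black balls drawn without replacement, then one white ball)."""
    prob = 1.0
    for j in range(n):
        prob *= (K - j) / (N - j)      # j-th black draw
    return prob * (N - K) / (N - n)    # white draw from the N - n remaining balls

N, K = 100_000, 60_000   # illustrative urn: 60% black balls
f = K / N

# Successive log-ratios log P(n + 1) - log P(n); a constant value means
# the distribution is geometric (exponential) in n.
log_ratios = [
    math.log(cascade_probability(n + 1, N, K) / cascade_probability(n, N, K))
    for n in range(0, 50)
]
```

As long as n is much smaller than N, every log-ratio stays close to log f, so log P decreases linearly in n rather than linearly in log n.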
The probability that a cascade of size n occurs, i.e. n black balls are drawn followed by a white ball, is

$$\cdots\;(K-n)^{K-n+1/2}\cdot\frac{N-n-(K-n)}{N-n} \qquad (4.2)$$

If $K \lesssim N$ then the first two terms in the last row of (4.2) cancel to a large extent. If we take the logarithm of (4.2), we get

$$\log P(S=n) \approx n\log\frac{K-n}{N-n} + \log\left(1 - \frac{K-n}{N-n}\right) \approx n\log(f) + \log(1-f), \qquad (4.3)$$

where f = (K − n)/(N − n) ≃ K/N is approximately the fraction of black balls in the urn. Equation (4.3) shows that the probability of a cascade is exponentially distributed, at least as long as n ≪ N. Furthermore, as K increases, f → 1, which implies a uniform distribution.

We propose a different urn model. Let $N_0$, $K_0$ and W be the initial total number of balls, the initial number of black balls and the number of white balls, respectively. The procedure of the model is as follows:
(i) Black balls are drawn with replacement. The number of black balls is increased by W/k after each draw. We continue to draw until a white ball is drawn.
(ii) The process repeats, with the numbers of balls reset to their original values $N = N_0$, $K = K_0$.
The rank distribution of the cascade sizes is plotted in figure 3b for k = 1.5. As we can see, the graph is very well fit by a straight line, indicating that a genuine power law emerges across the entire range of cascade sizes.
At the ith extraction we have the total number of balls $N_i$, of which $K_i$ are black and W are white. Then

$$N_i = K_i + W.$$

The urn model is designed such that at each extraction the ratio of $K_i$ and $N_i$ is approximately given by

(4.5)

Using the previous two equations we obtain that

The number of black balls is increased after each draw by $W/k$. If a constant number C of black balls is added after each draw, then k = W/C.
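The proposed urn can be explored with a short sketch. With $K_i = K_0 + iW/k$ black balls at the ith draw, the probability of drawing white is $W/(K_i + W)$, which behaves like k/i for large i; this reading of the design is our assumption, chosen to be consistent with the stated increment W/k. The code computes the exact probability that a cascade reaches size n and also simulates cascades directly. The values K0 = 1, W = 1, the seed and the function names are illustrative choices, and k = 1.5 matches the value quoted for figure 3b.

```python
import math
import random

def survival(n, K0=1.0, W=1.0, k=1.5):
    """P(cascade size >= n): the first n draws are all black."""
    prob, K = 1.0, K0
    for _ in range(n):
        prob *= K / (K + W)   # black drawn (with replacement)
        K += W / k            # black balls added after each draw
    return prob

def simulate_cascade(rng, K0=1.0, W=1.0, k=1.5):
    """Number of black balls drawn before the first white ball."""
    K, size = K0, 0
    while rng.random() < K / (K + W):
        size += 1
        K += W / k
    return size

# Exact log-log slope of the rank (survival) distribution over two decades.
slope = (math.log(survival(10_000)) - math.log(survival(100))) / (
    math.log(10_000) - math.log(100)
)

# Monte Carlo check of the same distribution at one cascade size.
rng = random.Random(1)
sizes = [simulate_cascade(rng) for _ in range(20_000)]
frac_ge_10 = sum(s >= 10 for s in sizes) / len(sizes)
```

Under these assumptions the slope is close to −k, so adding a constant number of black balls per draw is enough to produce a power law in the rank distribution.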

Discussion
The relevance of this study relies upon two aspects: the increasing importance of understanding statistical laws in complex systems due to the availability of larger datasets [37], and the presence of power law distributions in various application domains [4,38,39]. The applicability of the models proposed in §3.2 is restricted to phenomena where time plays a key role in increasing cascade sizes. The more time passes from the initial observation, the greater the chances of seeing a larger cascade (which would place it at the right-end tail of the distribution at the time of observation). These phenomena can appear in a wide variety of disciplinary fields. The frequency versus the amplitude of earthquakes [1] and avalanches [40,41] follows a power law, as well as the distribution of the peak gamma-ray intensity of solar flares [42], and the size of moon craters per surface area [43]. The largest observed cascade in these examples depends on time, with more extreme examples being discovered the longer the observation record is.
The social sciences also offer numerous examples of power law distributions [44] that naturally evolve in time (but whose time dependence has not necessarily been quantified), such as the range of time between two deaths in serial killers' behavioural patterns [45], narrative structure [46] or the budget distribution of movies [47]. In economics, they mainly derive from aggregation and rich-get-richer processes. The existence of power laws in the distribution of wealth and income has been known at least since the nineteenth century [48]. Notably, the work of Pareto was the first to identify a power law distribution in a social system [38]. Power law distributions also appear in markets. For example, the frequency of firm size, measured by the number of employees, is distributed according to a power law [49]. Specifically, the exponent of this power law is approximately 1.06, which suggests there exists a 'Zipf's Law' for firms [38].
Western popular music markets present power law distributions in the lifetime of albums [50]. Stock markets have exhibited power law distributions since their very beginning [51]. It has been shown that the number of trades per day, the size of price movements when a large volume of shares is bought or sold, and the number of shares traded per time period are power law distributed with exponents 3, 0.5 and 1.5, respectively [52–54]. Inter-trading times also exhibit scaling properties consistent with power laws [55].
Another field in which power law distributions are pervasive is urban studies, where there are many scaling relationships between quantifiable city properties [56], such as between the number of gas stations and population levels [57], and the distribution of the population of cities in a given geographical area [58]. This latter distribution appears in different geographical areas [59] and depends on the granularity of the sampling [60]. Power law distributions also characterize digital infrastructure. The network analysis of Internet topology shows that at the end of the 1990s the degrees of the nodes were already power law distributed [61,62]. Also, the number of connections to a server in a single day, by a specific subsection of Internet users, follows a power law [63].
Furthermore, the academic system generates power laws. As first noted by Price, the number of citations of papers follows a power law distribution [64]. Similar features in academic systems can be observed in recent times, even in specific disciplinary areas. For instance, the cumulative number of citations over time for papers dealing with protein kinases is distributed as a power law. This is a consequence of a phenomenon called the Harlow-Knapp (H-K) effect, which is the propensity of the biomedical and pharmaceutical research communities to concentrate their research on a tiny fraction of the proteome [65,66].
In history, the well-known Richardson's Law states that the sizes of wars (measured in casualties) are power law distributed. This law is considered one of the few robust statistical regularities in studies of political conflict [67]. Nevertheless, some recent studies seem to cast doubt on Richardson's Law, or at least suggest greater caution regarding it [68]. Another example from history regards the Roman Empire [69], for which the survival time of its emperors is distributed according to a power law [70].
Certain power law relationships do not depend on the observation period, and the proposed models would not apply to them. For example, in physics, the well-known Stefan-Boltzmann Law describes a power law relationship between the total energy radiated per unit time by a black body and its temperature [71,72]. Furthermore, in particle physics and astrophysics there are numerous examples of power laws, from the Tully-Fisher relationship between a galaxy's luminosity and its rate of rotation [73], to the proportionality of the spin and mass of hadrons [74] and the density of eigenvalues of the Dirac operator in certain theories [75,76]. Other examples come from biology, where many works describe a scaling relationship between the basal metabolic rate and the body mass of animals [77–79], such as the well-known Kleiber's Law, which indicates that the basal metabolism of mammals increases according to $m^{3/4}$, where m is the overall body mass [80].
Another well-known example (with no time dependence) is Zipf's Law, which states that the rank frequency distribution of words in any sufficiently long text is distributed as a power law [81]. This law is independent of the language and holds even for artificial languages such as Esperanto [82]. While Zipf investigates specifically the distribution of words, a generalization of this law was later developed by Mandelbrot [83].
We end the discussion with a cautionary note on the discovery of power laws in data and models. As is well noted in the literature [1], a linear relationship in log-log plots of the frequency distribution of observations is insufficient to robustly establish that a power law holds. A more reliable test is to check whether the linear relationship holds in log-log plots of rank distributions. As we have shown, the model proposed by Brunk [3] does not show power law behaviour. Another example where a power law is claimed to be observed is in the emergence of a scale-free network in systems with bounded rationality [84]. As other work has shown [85], the topology is not scale-free but rather of a core-periphery type.

Conclusion
In this paper, we have proposed a new mechanism to generate power laws based on Markov models. By analogy with the derivation of the Fokker-Planck equation, we have obtained a master equation for power laws. The result is supported by the fact that the underlying embedded Markov chain has an equilibrium distribution that is a power law and because a simplified version of the equation admits a closed form solution that is a power law.
We proposed three models: a discrete-time and discrete-space Markov chain with transition probabilities (3.6), a continuous-time and discrete-space Markov chain shown in figure 1b and a Markov process (continuum in both time and space), with master equation given by (3.9). A simplified form of (3.9) is equation (3.11), which is valid in the limit of large cascade sizes. Further considerations indicate that an exponent of k = 2 gives additional cancellations independent of cascade size. This value of k is consistent with exponents observed in Richardson's Law [68] and in the net worth of Americans [1]. Nevertheless, equation (3.11) has a closed-form solution for any exponent value. In addition, the equation provides the time dynamics for the power law relationship along with a natural cut-off size depending on the time horizon.
The stationary solutions for the general equation are given by a first-order differential equation, whose solutions have been explored elsewhere [32]. However, a stationary solution to the power law equation might not always be well defined, as our time-dependent solution (3.12) for the simplified equation (3.11) shows (the time derivative is non-zero). The insight we get from the exact time-dependent solution (3.5) of the Fokker-Planck equation is that the standard deviation of spatial displacements is proportional to the square root of time, which is characteristic of Brownian motion. Similarly, the time-dependent solution (3.12) shows that the largest cascade size we can expect to see is linearly proportional to time.
Based on the insights from the Markov process, we propose a simple urn model that illustrates power law behaviour over the entire range of cascade sizes. The model only considers a constant addition of balls over time and despite the simplicity of the mechanism, a robust power law is observed to emerge. Our results are in contrast to a prior model by Brunk [3] which we prove does not show a genuine power law distribution. Finally, we discuss some possible applications of the proposed model to phenomena where power laws are observed and where the duration of the period of observation is important. As far as we know, all these contributions are novel to the literature.
Data accessibility. This article has no additional data.
Authors' contributions. S.R.: conceptualization, formal analysis, investigation, methodology, project administration, visualization, writing-original draft, writing-review and editing; F.B.: data curation, validation, writing-original draft, writing-review and editing.
All authors gave final approval for publication and agreed to be held accountable for the work performed therein.